High-dimensional Principal Component Analysis
Authors
Abstract
High-dimensional Principal Component Analysis, by Arash Ali Amini. Doctor of Philosophy in Electrical Engineering, University of California, Berkeley. Associate Professor Martin Wainwright, Chair.

Advances in data acquisition and the emergence of new sources of data in recent years have led to the generation of massive datasets in many fields of science and engineering. These datasets are usually characterized by a high dimension and a small number of samples. Without appropriate modifications, classical tools of statistical analysis are not readily applicable in these “high-dimensional” settings. Much contemporary research in statistics and related fields aims to extend inference procedures, methodologies and theories to these new datasets. One widely used assumption that can mitigate the effects of dimensionality is sparsity of the underlying parameters.

In the first half of this thesis we consider principal component analysis (PCA), a classical dimension reduction procedure, in the high-dimensional setting with “hard” sparsity constraints. We analyze the statistical performance of two modified procedures for PCA: a simple diagonal cut-off method and a more elaborate semidefinite programming (SDP) relaxation. Our results characterize the statistical complexity of the two methods in terms of the number of samples required for asymptotic recovery. The results show a trade-off between statistical and computational complexity.

In the second half of the thesis, we consider PCA in function spaces (fPCA), an infinite-dimensional analog of PCA, also known as the Karhunen–Loève transform. We introduce a functional-theoretic framework to study the effects of sampling in fPCA under smoothness constraints on functions. The framework generates high-dimensional models with a different type of structural assumption, an “ellipsoid” condition, which can be thought of as a soft sparsity constraint. We provide an M-estimator of the principal component subspaces, which takes the form of a regularized eigenvalue problem. We provide rates of convergence for the estimator and show minimax optimality. Along the way, some problems in approximation theory are also discussed.
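The abstract only names the diagonal cut-off method, so the following Python sketch is an illustration of how such a procedure is commonly described, not the thesis's own implementation. It assumes the sparsity level `k` is known, keeps the `k` coordinates with the largest sample variances, and computes the leading eigenvector of the sample covariance restricted to those coordinates. The function names `diagonal_cutoff_pca` and `make_spiked_data`, the spiked covariance test model, and all parameter values are assumptions introduced here.

```python
import numpy as np


def diagonal_cutoff_pca(X, k):
    """Sketch of a diagonal cut-off estimate of a k-sparse leading component.

    X is an (n samples) x (p features) array; k is the assumed sparsity level.
    Returns the estimated unit vector (length p) and the selected support.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)                 # center each feature
    S = Xc.T @ Xc / n                       # p x p sample covariance
    # Keep the k coordinates with the largest sample variances (diagonal of S).
    support = np.sort(np.argsort(np.diag(S))[-k:])
    # Ordinary PCA on the selected k x k block.
    sub = S[np.ix_(support, support)]
    _, eigvecs = np.linalg.eigh(sub)        # eigenvalues in ascending order
    v = np.zeros(p)
    v[support] = eigvecs[:, -1]             # leading eigenvector, zero elsewhere
    return v, support


def make_spiked_data(n, p, k, beta, seed=0):
    """Hypothetical test model: covariance I + beta * z z^T with k-sparse unit z."""
    rng = np.random.default_rng(seed)
    z = np.zeros(p)
    z[:k] = 1.0 / np.sqrt(k)
    noise = rng.standard_normal((n, p))
    signal = np.sqrt(beta) * rng.standard_normal((n, 1)) * z
    return noise + signal, z


if __name__ == "__main__":
    X, z = make_spiked_data(n=2000, p=50, k=5, beta=5.0)
    v, support = diagonal_cutoff_pca(X, k=5)
    print("selected coordinates:", support)
    print("|<v, z>| =", abs(v @ z))
```

The selection step costs only a pass over the diagonal, which is what makes this approach cheap compared with an SDP relaxation; the abstract's trade-off between statistical and computational complexity refers to the sample sizes each method needs for recovery.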
Similar resources
Combined Unfolded Principal Component Analysis and Artificial Neural Network for Determination of Ibuprofen in Human Serum by Three-Dimensional Excitation–Emission Matrix Fluorescence Spectroscopy
This study describes a simple and rapid approach to monitoring ibuprofen (IBP). Unfolded principal component analysis-artificial neural network (UPCA-ANN) and excitation-emission spectra obtained by spectrofluorimetry were combined to develop a new model for the determination of IBP in human serum samples. Fluorescence landscapes with excitation wavelengths from 235 to 265 nm and emission...
Methods for regression analysis in high-dimensional data
As science, knowledge and technology have evolved, new and precise methods for measuring, collecting and recording information have been developed, which has led to the emergence and growth of high-dimensional data. A high-dimensional data set, i.e., a data set in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...
Identification of mineralization features and deep geochemical anomalies using a new FT-PCA approach
The analysis of geochemical data in the frequency domain, as indicated in this research study, can provide new exploratory information that may not be exposed in the spatial domain. To identify deep geochemical anomalies, the sulfide zone and geochemical noise in the Dalli Cu–Au porphyry deposit, a new approach based on coupling the Fourier transform (FT) and principal component analysis (PCA) has been used. The re...
An Empirical Comparison between Grade of Membership and Principal Component Analysis
It is the purpose of this paper to contribute to the discussion initiated by Wachter about the parallelism between principal component (PC) and a typological grade of membership (GoM) analysis. The author tested empirically the close relationship between both analyses in a low-dimensional framework comprising up to nine dichotomous variables and two typologies. Our contribution to the subject is also...
Compression of Breast Cancer Images By Principal Component Analysis
The principle of dimensionality reduction with PCA is the representation of the dataset ‘X’ in terms of eigenvectors e_i ∈ R^N of its covariance matrix. The eigenvectors oriented in the direction with the maximum variance of X in R^N carry the most relevant information of X. These eigenvectors are called principal components [8]. Ass...